NSF PAR Search | NSF Public Access Repository


Opening Doors to Physical Sample Data Discovery, Integration, and Credit

https://doi.org/10.31223/X5ST2K

Damerow, Joan; Raia, Natalie; Stanley, Val; Choe, Saebyul; Borton, Mikayla; Byers, Neil; Cassidy, Ellen; Cholia, Shreyas; Edmunds, Rorie; Forbes, Brieanne; et al (June 2024, Nature Scientific Data)

Physical samples and their associated (meta)data underpin scientific discoveries across disciplines, and can enable new science when appropriately archived. However, there are significant gaps in community practices and infrastructure that currently prevent accurate provenance tracking, reproducibility, and attribution. For the vast majority of samples, descriptive metadata is sparse, inaccessible, or absent. Samples and associated (meta)data may also be scattered across numerous physical collections, data repositories, laboratories, data files, and papers, with no clear linkages or provenance tracking as new information is generated over time. The Physical Samples Curation Cluster has therefore developed ‘A Scientific Author Guide for Publishing Open Research Using Physical Samples.’ This involved synthesizing existing practices and community feedback, and assessing real-world examples to identify community and infrastructure needs. We identified areas of work needed to enable authors to efficiently reference samples and related data, link related samples and data, and track their use. Our goal is to help improve the discoverability, interoperability, and use of physical samples and associated (meta)data into the future.
Full Text Available
iVirus 2.0: Cyberinfrastructure-supported tools and data to power DNA virus ecology

https://doi.org/10.1038/s43705-021-00083-3

Bolduc, Benjamin; Zablocki, Olivier; Guo, Jiarong; Zayed, Ahmed A.; Vik, Dean; Dehal, Paramvir; Wood-Charlson, Elisha M.; Arkin, Adam; Merchant, Nirav; Pett-Ridge, Jennifer; et al (December 2021, ISME Communications)

Microbes drive myriad ecosystem processes, but under strong influence from viruses. Because studying viruses in complex systems requires different tools than those for microbes, they remain underexplored. To combat this, we previously aggregated double-stranded DNA (dsDNA) virus analysis capabilities and resources into ‘iVirus’ on the CyVerse collaborative cyberinfrastructure. Here we substantially expand iVirus’s functionality and accessibility, to iVirus 2.0, as follows. First, core iVirus apps were integrated into the Department of Energy’s Systems Biology KnowledgeBase (KBase) to provide an additional analytical platform. Second, at CyVerse, 20 software tools (apps) were upgraded or added as new tools and capabilities. Third, nearly 20-fold more sequence reads were aggregated to capture new data and environments. Finally, documentation, as “live” protocols, was updated to maximize user interaction with and contribution to infrastructure development. Together, iVirus 2.0 serves as a uniquely central and accessible analytical platform for studying how viruses, particularly dsDNA viruses, impact diverse microbial ecosystems.
Planet Microbe: a platform for marine microbiology to discover and analyze interconnected ‘omics and environmental data

https://doi.org/10.1093/nar/gkaa637

Ponsero, Alise J; Bomhoff, Matthew; Blumberg, Kai; Youens-Clark, Ken; Herz, Nina M; Wood-Charlson, Elisha M; Delong, Edward F; Hurwitz, Bonnie L (July 2020, Nucleic Acids Research)
In recent years, large-scale oceanic sequencing efforts have provided a deeper understanding of marine microbial communities and their dynamics. These research endeavors require the acquisition of complex and varied datasets through large, interdisciplinary and collaborative efforts. However, no unifying framework currently exists for the marine science community to integrate sequencing data with physical, geological, and geochemical datasets. Planet Microbe is a web-based platform that enables data discovery from curated historical and ongoing oceanographic sequencing efforts. In Planet Microbe, each ‘omics sample is linked with other biological and physicochemical measurements collected for the same water samples or during the same sample collection event, to provide a broader environmental context. This work highlights the need for curated aggregation efforts that can enable new insights into high-quality metagenomic datasets. Planet Microbe is freely accessible from https://www.planetmicrobe.org/.
Full Text Available
STREAMS guidelines: standards for technical reporting in environmental and host-associated microbiome studies

https://doi.org/10.1038/s41564-025-02186-2

Kelliher, Julia M; Mirzayi, Chloe; Bordenstein, Sarah R; Oliver, Aaron; Kellogg, Christina A; Hatcher, Eneida L; Berg, Maureen; Baldrian, Petr; Aljumaah, Mashael; Miller, Cassandra_Maria Luz; et al (December 2025, Nature Microbiology)

Free, publicly-accessible full text available December 1, 2026
The ModelSEED Biochemistry Database for the integration of metabolic annotations and the reconstruction, comparison and analysis of metabolic models for plants, fungi and microbes

https://doi.org/10.1093/nar/gkaa746

Seaver, Samuel M; Liu, Filipe; Zhang, Qizhi; Jeffryes, James; Faria, José P; Edirisinghe, Janaka N; Mundy, Michael; Chia, Nicholas; Noor, Elad; Beber, Moritz E; et al (September 2020, Nucleic Acids Research)

For over 10 years, ModelSEED has been a primary resource for the construction of draft genome-scale metabolic models based on annotated microbial or plant genomes. Now being released, the biochemistry database serves as the foundation of biochemical data underlying ModelSEED and KBase. The biochemistry database embodies several properties that, taken together, distinguish it from other published biochemistry resources by: (i) including compartmentalization, transport reactions, charged molecules and proton balancing on reactions; (ii) being extensible by the user community, with all data stored in GitHub; and (iii) being designed as a biochemical ‘Rosetta Stone’ to facilitate comparison and integration of annotations from many different tools and databases. The database was constructed by combining chemical data from many resources, applying standard transformations, identifying redundancies and computing thermodynamic properties. The ModelSEED biochemistry is continually tested using flux balance analysis to ensure the biochemical network is modeling-ready and capable of simulating diverse phenotypes. Ontologies can be designed to aid in comparing and reconciling metabolic reconstructions that differ in how they represent various metabolic pathways. ModelSEED now includes 33,978 compounds and 36,645 reactions, available as a set of extensible files on GitHub, and available to search at https://modelseed.org/biochem and KBase.
Full Text Available
A genomic catalog of Earth’s microbiomes

https://doi.org/10.1038/s41587-020-0718-6

Nayfach, Stephen; Roux, Simon; Seshadri, Rekha; Udwary, Daniel; Varghese, Neha; Schulz, Frederik; Wu, Dongying; Paez-Espino, David; Chen, I-Min; Huntemann, Marcel; et al (November 2020, Nature Biotechnology)
The reconstruction of bacterial and archaeal genomes from shotgun metagenomes has enabled insights into the ecology and evolution of environmental and host-associated microbiomes. Here we applied this approach to >10,000 metagenomes collected from diverse habitats covering all of Earth’s continents and oceans, including metagenomes from human and animal hosts, engineered environments, and natural and agricultural soils, to capture extant microbial, metabolic and functional potential. This comprehensive catalog includes 52,515 metagenome-assembled genomes representing 12,556 novel candidate species-level operational taxonomic units spanning 135 phyla. The catalog expands the known phylogenetic diversity of bacteria and archaea by 44% and is broadly available for streamlined comparative analyses, interactive exploration, metabolic modeling and bulk download. We demonstrate the utility of this collection for understanding secondary-metabolite biosynthetic potential and for resolving thousands of new host linkages to uncultivated viruses. This resource underscores the value of genome-centric approaches for revealing genomic properties of uncultivated microorganisms that affect ecosystem processes.
Full Text Available
Microbiome Metadata Standards: Report of the National Microbiome Data Collaborative’s Workshop and Follow-On Activities

https://doi.org/10.1128/mSystems.01194-20

Vangay, Pajau; Burgin, Josephine; Johnston, Anjanette; Beck, Kristen L.; Berrios, Daniel C.; Blumberg, Kai; Canon, Shane; Chain, Patrick; Chandonia, John-Marc; Christianson, Danielle; et al (February 2021, mSystems)
Bucci, Vanni (Ed.)
Microbiome samples are inherently defined by the environment in which they are found. Therefore, data that provide context and enable interpretation of measurements produced from biological samples, often referred to as metadata, are critical.
Full Text Available
A roadmap for the functional annotation of protein families: a community perspective

https://doi.org/10.1093/database/baac062

de Crécy-lagard, Valérie; Amorin de Hegedus, Rocio; Arighi, Cecilia; Babor, Jill; Bateman, Alex; Blaby, Ian; Blaby-Haas, Crysten; Bridge, Alan J.; Burley, Stephen K.; Cleveland, Stacey; et al (August 2022, Database)

Over the last 25 years, biology has entered the genomic era and is becoming a science of ‘big data’. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. Such a large gap in knowledge hampers all aspects of biological enterprise and thereby stands in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue, funded by the National Science Foundation, was held on 3–4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues that were obstructing the field were identified and discussed, which ultimately allowed for the proposal of solutions on how to move forward.
